Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals
Human infants can discover words directly from unsegmented speech signals
without any explicitly labeled data. In this paper, we develop a novel machine
learning method called nonparametric Bayesian double articulation analyzer
(NPB-DAA) that can directly acquire language and acoustic models from observed
continuous speech signals. For this purpose, we propose an integrative
generative model that combines a language model and an acoustic model into a
single generative model called the "hierarchical Dirichlet process hidden
language model" (HDP-HLM). The HDP-HLM is obtained by extending the
hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by
Johnson et al. An inference procedure for the HDP-HLM is derived using the
blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure
enables the simultaneous and direct inference of language and acoustic models
from continuous speech signals. Based on the HDP-HLM and its inference
procedure, we developed a novel double articulation analyzer. By assuming
HDP-HLM as a generative model of observed time series data, and by inferring
latent variables of the model, the method can analyze latent double
articulation structure, i.e., hierarchically organized latent words and
phonemes, of the data in an unsupervised manner. The novel unsupervised double
articulation analyzer is called NPB-DAA.
The NPB-DAA can automatically estimate double articulation structure embedded
in speech signals. We also carried out two evaluation experiments using
synthetic data and actual human continuous speech signals representing Japanese
vowel sequences. In the word acquisition and phoneme categorization tasks, the
NPB-DAA outperformed a conventional double articulation analyzer (DAA) and a
baseline automatic speech recognition system whose acoustic model was trained
in a supervised manner.
Comment: 15 pages, 7 figures, Draft submitted to IEEE Transactions on
Autonomous Mental Development (TAMD
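The double articulation structure described in this abstract (words made of phonemes, each phoneme lasting for an explicit duration) can be illustrated with a toy generative sketch. This is not the HDP-HLM itself: the vocabulary, bigram table, emission means, and the function name `sample_utterance` are all hypothetical, and the nonparametric (HDP) priors are replaced by small fixed tables.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy inventory: each latent word is a fixed sequence of latent phonemes.
WORDS: Dict[str, List[str]] = {"aioi": ["a", "i", "o", "i"], "ue": ["u", "e"]}
# Hypothetical bigram language model over words (each row sums to 1).
WORD_BIGRAM: Dict[str, Dict[str, float]] = {
    "aioi": {"aioi": 0.3, "ue": 0.7},
    "ue": {"aioi": 0.6, "ue": 0.4},
}
# Hypothetical 1-D Gaussian emission mean for each phoneme (the "acoustic model").
PHONEME_MEAN: Dict[str, float] = {"a": 0.0, "i": 1.0, "u": 2.0, "e": 3.0, "o": 4.0}


def sample_utterance(
    n_words: int, rng: random.Random, max_duration: int = 5
) -> Tuple[List[float], List[Tuple[str, str]]]:
    """Sample observation frames plus the latent (word, phoneme) label of each frame."""
    frames: List[float] = []
    labels: List[Tuple[str, str]] = []
    word = rng.choice(sorted(WORDS))  # uniform choice of the first word
    for _ in range(n_words):
        for phoneme in WORDS[word]:
            # Semi-Markov assumption: the phoneme persists for an explicitly sampled duration.
            duration = rng.randint(1, max_duration)
            for _frame in range(duration):
                frames.append(rng.gauss(PHONEME_MEAN[phoneme], 0.1))
                labels.append((word, phoneme))
        # Language model step: draw the next word given the current one.
        successors = WORD_BIGRAM[word]
        word = rng.choices(list(successors), weights=list(successors.values()))[0]
    return frames, labels
```

An unsupervised analyzer would observe only `frames` and would have to infer `labels`; the sketch covers the generative direction only.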
SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model
To realize human-like robot intelligence, a large-scale cognitive
architecture is required for robots to understand the environment through a
variety of sensors with which they are equipped. In this paper, we propose a
novel framework named Serket that makes it easy to construct a large-scale
generative model and perform inference on it by connecting sub-modules, so that
robots can acquire various capabilities through interaction with their
environments and with others. We consider that large-scale cognitive models can be
constructed by connecting smaller fundamental models hierarchically while
maintaining their programmatic independence. Moreover, connected modules are
dependent on each other, and parameters are required to be optimized as a
whole. Conventionally, the equations for parameter estimation have to be
derived and implemented depending on the models. However, it becomes harder to
derive and implement those of a larger scale model. To solve these problems, in
this paper, we propose a method for parameter estimation by communicating the
minimal parameters between various modules while maintaining their programmatic
independence. Therefore, Serket makes it easy to construct large-scale models
and estimate their parameters via the connection of modules. Experimental
results demonstrated that the model can be constructed by connecting modules,
the parameters can be optimized as a whole, and they are comparable with the
original models that we have proposed.
Multimodal Hierarchical Dirichlet Process-based Active Perception
In this paper, we propose an active perception method for recognizing object
categories based on the multimodal hierarchical Dirichlet process (MHDP). The
MHDP enables a robot to form object categories using multimodal information,
e.g., visual, auditory, and haptic information, which can be observed by
performing actions on an object. However, performing many actions on a target
object requires a long time. In a real-time scenario, i.e., when the time is
limited, the robot has to determine the set of actions that is most effective
for recognizing a target object. We propose an MHDP-based active perception
method that uses the information gain (IG) maximization criterion and lazy
greedy algorithm. We show that the IG maximization criterion is optimal in the
sense that the criterion is equivalent to a minimization of the expected
Kullback-Leibler divergence between a final recognition state and the
recognition state after the next set of actions. However, a straightforward
calculation of IG is practically impossible. Therefore, we derive an efficient
Monte Carlo approximation method for IG by making use of a property of the
MHDP. We also show that the IG has submodular and non-decreasing properties as
a set function because of the structure of the graphical model of the MHDP.
Therefore, the IG maximization problem is reduced to a submodular maximization
problem. This means that greedy and lazy greedy algorithms are effective and
have a theoretical justification for their performance. We conducted an
experiment using an upper-torso humanoid robot and a second one using synthetic
data. The experimental results show that the method enables the robot to select
a set of actions that allow it to recognize target objects quickly and
accurately. The results support our theoretical outcomes.
Comment: submitte
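The lazy greedy algorithm mentioned in this abstract can be sketched as follows for any monotone submodular set function. This is a minimal illustration, not the authors' implementation: `lazy_greedy`, `coverage_gain`, and the `COVER` table are hypothetical names, and a simple set-coverage function stands in for the Monte Carlo estimate of information gain.

```python
import heapq
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


def lazy_greedy(
    candidates: Iterable[str],
    gain: Callable[[FrozenSet[str], str], float],
    budget: int,
) -> List[str]:
    """Pick up to `budget` items; `gain(selected, item)` is the marginal gain."""
    selected: List[str] = []
    current: FrozenSet[str] = frozenset()
    # Max-heap via negated gains; the index breaks ties deterministically.
    heap = [(-gain(current, c), i, c) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        _, i, item = heapq.heappop(heap)
        fresh = gain(current, item)  # re-evaluate the possibly stale bound
        if not heap or fresh >= -heap[0][0]:
            # By submodularity, stale gains are upper bounds, so `item` is the best.
            selected.append(item)
            current = current | {item}
        else:
            heapq.heappush(heap, (-fresh, i, item))
    return selected


# Hypothetical toy problem: each action "covers" some features of the object.
COVER: Dict[str, Set[int]] = {
    "look": {1, 2, 3},
    "grasp": {3, 4},
    "shake": {4, 5, 6, 7},
    "hit": {1, 7},
}


def coverage_gain(selected: FrozenSet[str], item: str) -> int:
    """Number of features newly covered by adding `item` (monotone submodular)."""
    covered = set().union(*(COVER[s] for s in selected))
    return len(COVER[item] - covered)
```

With this toy table, `lazy_greedy(COVER, coverage_gain, budget=2)` selects `"shake"` first and `"look"` second. The saving over plain greedy is that an item is re-evaluated only when it reaches the top of the heap, which matters when each gain evaluation is expensive.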
Adiabatic internuclear potentials obtained by energy variation with the internuclear-distance constraint
We propose a method to obtain adiabatic internuclear potentials via energy
variation with the intercluster-distance constraint. The adiabatic O +
O potentials obtained by the proposed method are applied to
investigate the effects of valence neutrons in O + O sub-barrier
fusions. Sub-barrier fusion cross sections of O + O are enhanced
more compared to those of O + O because of distortion of valence
neutrons in O.
Comment: 11 pages, 5 figure
Control as Probabilistic Inference as an Emergent Communication Mechanism in Multi-Agent Reinforcement Learning
This paper proposes a generative probabilistic model integrating emergent
communication and multi-agent reinforcement learning. The agents plan their
actions by probabilistic inference, called control as inference, and
communicate using messages that are latent variables and estimated based on the
planned actions. Through these messages, each agent can send information about
its actions and receive information about the actions of another agent. Therefore,
the agents change their actions according to the estimated messages to achieve
cooperative tasks. This inference of messages can be considered as
communication, and this procedure can be formulated by the Metropolis-Hastings
naming game. Through experiments in the grid world environment, we show that
the proposed probabilistic graphical model (PGM) can infer meaningful messages
to achieve the cooperative task.
Integration of Imitation Learning using GAIL and Reinforcement Learning using Task-achievement Rewards via Probabilistic Graphical Model
Integration of reinforcement learning and imitation learning is an important
problem that has been studied for a long time in the field of intelligent
robotics. Reinforcement learning optimizes policies to maximize the cumulative
reward, whereas imitation learning attempts to extract general knowledge about
the trajectories demonstrated by experts, i.e., demonstrators. Because each
approach has its own drawbacks, methods that combine them so that each
compensates for the other's drawbacks have been explored. However, many of these methods are
heuristic and do not have a solid theoretical basis. In this paper, we present
a new theory for integrating reinforcement and imitation learning by extending
the probabilistic generative model framework for reinforcement learning,
"plan by inference". We develop a new probabilistic graphical model for
reinforcement learning with multiple types of rewards and a probabilistic
graphical model for Markov decision processes with multiple optimality
emissions (pMDP-MO). Furthermore, we demonstrate that the integrated learning
method of reinforcement learning and imitation learning can be formulated as a
probabilistic inference of policies on pMDP-MO by considering the output of the
discriminator in generative adversarial imitation learning as an additional
optimal emission observation. We adapt the generative adversarial imitation
learning and task-achievement reward to our proposed framework, achieving
significantly better performance than agents trained with reinforcement
learning or imitation learning alone. Experiments demonstrate that our
framework successfully integrates imitation and reinforcement learning even
when only a few demonstrators are available.
Comment: Submitted to Advanced Robotic
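One way to read the "multiple optimality emissions" idea is sketched below. It follows the common control-as-inference convention, not necessarily the paper's exact formulation. The symbols $O^{R}_t$ (task optimality), $O^{I}_t$ (imitation optimality), $r$ (task-achievement reward), and $D$ (discriminator output) are assumptions introduced here, as is the conditional independence of the two emissions given the state and action.

```latex
\begin{align*}
p(O^{R}_t = 1 \mid s_t, a_t) &\propto \exp\bigl(r(s_t, a_t)\bigr), \\
p(O^{I}_t = 1 \mid s_t, a_t) &\propto D(s_t, a_t), \\
p(O^{R}_t = 1,\, O^{I}_t = 1 \mid s_t, a_t)
  &\propto \exp\bigl(r(s_t, a_t) + \log D(s_t, a_t)\bigr).
\end{align*}
```

Under these assumptions, conditioning on both optimality variables amounts to inferring a policy for the summed reward $r + \log D$, which is how a discriminator output can act as an additional reward-like observation.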